Reinforcement learning
by daniel-hromada


Repetitio

Experiential learning, Unsupervised learning, Supervised learning, Classifiers & Machine Learning ...

From supervised to reinforcement learning

Supervised learning resembles a structured classroom environment, where explicit feedback is given for each example (e.g., a teacher correcting a student's answers). In contrast, reinforcement learning mirrors experiential learning, where feedback comes as rewards or penalties after actions, guiding behavior toward long-term goals. For instance, a child learning to ride a bike might fall (penalty) or stay balanced (reward), gradually improving through trial and error.

Agent-Environment Framework

In machines, reinforcement learning (RL) is implemented using an agent-environment framework. The agent interacts with an environment by taking actions according to a policy (a strategy that maps situations to actions). The environment responds with a new state and a reward or penalty, which the agent uses to improve its policy. Key components include a reward function that scores immediate outcomes, a value function that estimates the long-term cumulative reward of states or actions, and an exploration strategy that balances trying new behaviors (exploration) against repeating actions already known to pay off (exploitation).
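The interaction loop described above can be sketched in a few lines of Python. This is only an illustration: the `Corridor` environment, its reward values, and the names `random_policy` and `run_episode` are hypothetical and not part of the lecture material.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, any other action moves left; walls clamp the move
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        # reward function (assumed values): +1 at the goal, small penalty otherwise
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def random_policy(state, rng):
    """A policy maps a state to an action; this one ignores the state."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Run one agent-environment episode and return the summed reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent acts
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
print(run_episode(Corridor(), random_policy, rng))
```

A learning agent would replace `random_policy` with a policy that improves from the rewards it observes.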

Quote

“It is unworthy of excellent men to lose hours like slaves in the labour of calculation which could safely be relegated to anyone else if machines were used.”

G.W. Leibniz (Describing, in 1685, the value to astronomers of the hand-cranked calculating machine he had invented in 1673.)

Law of Effect

When a response is followed by a satisfying outcome, that response becomes more likely to be repeated in the same situation (E. L. Thorndike).

Q-learning

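Q-learning is a model-free reinforcement learning algorithm that lets an agent learn an optimal policy for decision-making. It works by estimating Q-values (the action-value function), which represent the expected cumulative discounted reward for taking an action in a given state and acting optimally afterwards. The agent updates Q-values iteratively using the formula:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward received, and $s'$ the next state.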
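The standard Q-learning update moves Q(s, a) a step of size alpha toward r + gamma * max Q(s', a'). Below is a minimal tabular sketch of that rule, assuming a hypothetical five-cell corridor with a reward of 1 at the right end. The environment, the hyperparameter values, and all names in the code are illustrative assumptions.

```python
import random

N_STATES = 5       # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(state, action):
    """Toy environment dynamics (an assumption made for illustration)."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: one row per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: explore with probability epsilon, else exploit
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # a terminal state has no future value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)   # → [1, 1, 1, 1]
```

The greedy policy read off the table moves right in every non-terminal cell, which is the shortest path to the reward in this toy setting.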

Deep reinforcement learning

Deep reinforcement learning (DRL) is a type of machine learning in which an agent learns to make decisions by trial and error, guided by rewards or penalties, using deep neural networks to represent its policy or value function. Unlike tabular methods, which become impractical when the number of possible states is very large, DRL can learn directly from raw, high-dimensional input such as images or game screens. The neural network helps the agent recognize patterns and generalize to situations it has not seen before. DRL has achieved impressive results in playing games (e.g., Atari video games, or Go with AlphaGo) and in robot control, and it is being explored for autonomous driving, making it a powerful tool for real-world problems involving sequential decision-making.
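To make the idea concrete, here is a deliberately tiny sketch in which a neural network replaces the Q-table. It uses one hidden layer and plain Python lists instead of a deep-learning library. The class `TinyQNetwork`, its layer sizes, and the sample observation are assumptions for illustration; real DRL systems use much deeper networks plus techniques such as experience replay.

```python
import math
import random


class TinyQNetwork:
    """One hidden layer; maps an observation vector to one Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, obs):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, obs)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q_values = [sum(w * h for w, h in zip(row, hidden)) + b
                    for row, b in zip(self.w2, self.b2)]
        return hidden, q_values

    def td_update(self, obs, action, reward, next_obs, done, gamma=0.99, lr=0.05):
        """One semi-gradient step on the squared TD error; returns that error."""
        hidden, q_values = self.forward(obs)
        _, next_q = self.forward(next_obs)
        target = reward if done else reward + gamma * max(next_q)
        error = target - q_values[action]
        # Backpropagate only through the output unit of the chosen action.
        for j, h in enumerate(hidden):
            # derivative of Q w.r.t. hidden pre-activation j (uses w2 before its update)
            grad_hidden = self.w2[action][j] * (1.0 - h * h)
            self.w2[action][j] += lr * error * h
            for i, x in enumerate(obs):
                self.w1[j][i] += lr * error * grad_hidden * x
            self.b1[j] += lr * error * grad_hidden
        self.b2[action] += lr * error
        return error


net = TinyQNetwork(n_inputs=4, n_hidden=8, n_actions=2)
obs = [0.2, -0.1, 0.5, 0.9]   # stands in for raw pixels or sensor readings
errors = [net.td_update(obs, action=1, reward=1.0, next_obs=obs, done=True)
          for _ in range(50)]
print(abs(errors[0]), abs(errors[-1]))
```

Repeating the update on the same transition shrinks the temporal-difference error, which is the same learning signal as in tabular Q-learning, only applied to network weights.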

AlphaGo

In March 2016, AlphaGo stunned the world by defeating Go champion Lee Sedol 4-1, showing that AI could outplay top humans at one of the most complex board games. Combining deep neural networks with Monte Carlo Tree Search, it played moves no human expert would have dared, showcasing creativity, brilliance, and the unsettling realization that humanity might be screwed.

Hebb's Law

"Cells that fire together, wire together."

Explanation

When two neurons in the brain activate at the same time repeatedly, their connection strengthens. This makes it easier and more probable for one to trigger the other in the future.
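In its simplest mathematical form, Hebb's law is often written as a weight change proportional to the product of the two neurons' activities. The sketch below assumes binary activity (1 = fired, 0 = silent) and an arbitrary learning rate. The function name and the activity sequence are made up for illustration.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Plain Hebbian rule: strengthen the connection when both units are active."""
    return weight + learning_rate * pre * post


weight = 0.0
# (pre, post) activity pairs: 1 = the neuron fired, 0 = it stayed silent
activity = [(1, 1), (1, 0), (1, 1), (0, 1), (1, 1)]
for pre, post in activity:
    weight = hebbian_update(weight, pre, post)

print(round(weight, 2))   # → 0.3
```

Only the three steps where both neurons fired together change the weight. In this plain form the weight can only grow, which is why practical variants add some form of normalization or decay.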

Art Analogy

Imagine practicing a particular brushstroke over and over. Each time, your hand and brain coordinate, and with practice, the connection becomes stronger and the stroke becomes smoother. Similarly, Hebb’s law underpins how practice makes perfect.

XMAS

And it came to pass in those days, that there went out a decree from Caesar Augustus, that all the world should be taxed.

And this taxing was first made when Cyrenius was governor of Syria.

And all went to be taxed, every one into his own city.

And Joseph also went up from Galilee, out of the city of Nazareth, into Judaea, unto the city of David, which is called Bethlehem, because he was of the house and lineage of David, to be taxed with Mary his espoused wife, being great with child ...